India Road Accident (2021-22) Data Analysis


An Introduction to the Dataset

Road Accident Dataset for India (2021-22)

This dataset provides detailed information about road accidents in India for 2021 and 2022. Its attributes are used here for exploratory data analysis (EDA) in Python to uncover patterns, trends, and insights related to road safety. The dataset is well suited to identifying the key factors that contribute to road accidents, assessing how different conditions affect accident severity, and informing strategies for improving road safety.

Importing necessary libraries

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import datetime
import warnings
warnings.filterwarnings("ignore")
sns.set()

Read the accident datasets

df = pd.read_excel('Road Accident India 2021-22.xlsx')
df_states = pd.read_csv('State wise Accidents data.csv')

Data Overview

df.head()
Accident_Index Accident Date Month Year Day_of_Week Junction_Control Junction_Detail Accident_Severity Latitude Light_Conditions ... Number_of_Casualties Number_of_Vehicles Police_Force Road_Surface_Conditions Road_Type Speed_limit Time Urban_or_Rural_Area Weather_Conditions Vehicle_Type
0 200901BS70001 2021-01-01 Jan 2021 Thursday Give way or uncontrolled T or staggered junction Serious 51.512273 Daylight ... 1 2 Metropolitan Police Dry One way street 30 15:11:00 Urban Fine no high winds Car
1 200901BS70002 2021-01-05 Jan 2021 Monday Give way or uncontrolled Crossroads Serious 51.514399 Daylight ... 11 2 Metropolitan Police Wet or damp Single carriageway 30 10:59:00 Urban Fine no high winds Taxi/Private hire car
2 200901BS70003 2021-01-04 Jan 2021 Sunday Give way or uncontrolled T or staggered junction Slight 51.486668 Daylight ... 1 2 Metropolitan Police Dry Single carriageway 30 14:19:00 Urban Fine no high winds Taxi/Private hire car
3 200901BS70004 2021-01-05 Jan 2021 Monday Auto traffic signal T or staggered junction Serious 51.507804 Daylight ... 1 2 Metropolitan Police Frost or ice Single carriageway 30 08:10:00 Urban Other Motorcycle over 500cc
4 200901BS70005 2021-01-06 Jan 2021 Tuesday Auto traffic signal Crossroads Serious 51.482076 Darkness - lights lit ... 1 2 Metropolitan Police Dry Single carriageway 30 17:25:00 Urban Fine no high winds Car

5 rows × 23 columns

df_states
_id State/UT/City Dangerous or Careless Driving/ Overtaking etc Cases Dangerous or Careless Driving/ Overtaking etc Injured Dangerous or Careless Driving/ Overtaking etc Died Overspeeding Cases Overspeeding Injured Overspeeding Died Driving under Influence of Drug/Alcohol Cases Driving under Influence of Drug/Alcohol Injured ... Vehicles Parking at Road Shoulders Died Causes Not Known Cases Causes Not Known Injured Causes Not Known Died Other Causes Cases Other Causes Injured Other Causes Died Total Road Accidents Cases Total Road Accidents Injured Total Road Accidents Died
0 1 ANDHRA PRADESH 2185 2271 755 16631 16188 6371 119 64 ... 18.0 121.0 119.0 32.0 2129.0 1957.0 817.0 21556.0 21040.0 8186.0
1 2 ARUNACHAL PRADESH 65 59 40 120 127 74 3 6 ... 0.0 9.0 4.0 7.0 38.0 37.0 28.0 261.0 266.0 173.0
2 3 ASSAM 886 833 347 4303 3237 1946 288 201 ... 45.0 42.0 0.0 10.0 89.0 95.0 21.0 7069.0 5420.0 3014.0
3 4 BIHAR 5039 4134 4071 2886 2348 2284 51 53 ... 95.0 20.0 12.0 22.0 101.0 70.0 77.0 9553.0 7946.0 7660.0
4 5 CHHATTISGARH 3536 3258 1750 6378 5603 2723 145 159 ... 71.0 455.0 220.0 258.0 1163.0 917.0 445.0 12395.0 10682.0 5413.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
88 89 VARANASI 41 26 60 36 14 35 9 3 ... 3.0 25.0 26.0 34.0 0.0 0.0 0.0 133.0 86.0 145.0
89 90 VASAI VIRAR 70 62 22 276 191 125 0 0 ... 0.0 1.0 0.0 1.0 5.0 3.0 2.0 352.0 256.0 150.0
90 91 VIJAYAWADA 124 129 20 1101 952 267 3 0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1228.0 1081.0 287.0
91 92 VISHAKHAPATNAM 31 26 6 1785 1166 261 41 7 ... 0.0 0.0 0.0 0.0 460.0 313.0 95.0 2339.0 1533.0 368.0
92 93 TOTAL (CITIES) 14335 12062 3885 31753 27448 7415 1137 843 ... 77.0 1441.0 1228.0 328.0 4578.0 4344.0 778.0 55442.0 47523.0 13384.0

93 rows × 44 columns

#Dimensions
df.shape
(307973, 23)
df_states.shape
(93, 44)
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 307973 entries, 0 to 307972
Data columns (total 23 columns):
 #   Column                      Non-Null Count   Dtype         
---  ------                      --------------   -----         
 0   Accident_Index              307973 non-null  object        
 1   Accident Date               307973 non-null  datetime64[ns]
 2   Month                       307973 non-null  object        
 3   Year                        307973 non-null  int64         
 4   Day_of_Week                 307973 non-null  object        
 5   Junction_Control            307973 non-null  object        
 6   Junction_Detail             307973 non-null  object        
 7   Accident_Severity           307973 non-null  object        
 8   Latitude                    307973 non-null  float64       
 9   Light_Conditions            307973 non-null  object        
 10  Local_Authority_(District)  307973 non-null  object        
 11  Carriageway_Hazards         5424 non-null    object        
 12  Longitude                   307973 non-null  float64       
 13  Number_of_Casualties        307973 non-null  int64         
 14  Number_of_Vehicles          307973 non-null  int64         
 15  Police_Force                307973 non-null  object        
 16  Road_Surface_Conditions     307656 non-null  object        
 17  Road_Type                   306439 non-null  object        
 18  Speed_limit                 307973 non-null  int64         
 19  Time                        307956 non-null  object        
 20  Urban_or_Rural_Area         307973 non-null  object        
 21  Weather_Conditions          301916 non-null  object        
 22  Vehicle_Type                307973 non-null  object        
dtypes: datetime64[ns](1), float64(2), int64(4), object(16)
memory usage: 54.0+ MB
df_states.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 93 entries, 0 to 92
Data columns (total 44 columns):
 #   Column                                                 Non-Null Count  Dtype  
---  ------                                                 --------------  -----  
 0   _id                                                    93 non-null     int64  
 1   State/UT/City                                          93 non-null     object 
 2   Dangerous or Careless Driving/ Overtaking etc Cases    93 non-null     int64  
 3   Dangerous or Careless Driving/ Overtaking etc Injured  93 non-null     int64  
 4   Dangerous or Careless Driving/ Overtaking etc Died     93 non-null     int64  
 5   Overspeeding Cases                                     93 non-null     int64  
 6   Overspeeding Injured                                   93 non-null     int64  
 7   Overspeeding Died                                      93 non-null     int64  
 8   Driving under Influence of Drug/Alcohol Cases          93 non-null     int64  
 9   Driving under Influence of Drug/Alcohol Injured        93 non-null     int64  
 10  Driving under Influence of Drug/Alcohol Died           93 non-null     int64  
 11  Physical Fatigue of Drivers Cases                      93 non-null     int64  
 12  Physical Fatigue of Drivers Injured                    93 non-null     int64  
 13  Physical Fatigue of Drivers Died                       93 non-null     int64  
 14  Defect in Mechanical Condition of Vehicle Cases        93 non-null     int64  
 15  Defect in Mechanical Condition of Vehicle Injured      93 non-null     int64  
 16  Defect in Mechanical Condition of Vehicle Died         93 non-null     int64  
 17  Animal Crossing Cases                                  93 non-null     int64  
 18  Animal Crossing Injured                                93 non-null     int64  
 19  Animal Crossing Died                                   93 non-null     int64  
 20  Weather Condition (Total) Cases                        93 non-null     int64  
 21  Weather Condition (Total) Injured                      93 non-null     int64  
 22  Weather Condition (Total) Died                         93 non-null     int64  
 23  Weather Condition (Poor Visibility) Cases              93 non-null     int64  
 24  Weather Condition (Poor Visibility) Injured            93 non-null     int64  
 25  Weather Condition (Poor Visibility) Died               93 non-null     int64  
 26  Weather Condition (Other Causes) Cases                 93 non-null     int64  
 27  Weather Condition (Other Causes) Injured               93 non-null     int64  
 28  Weather Condition (Other Causes) Died                  93 non-null     int64  
 29  Lack of Road Infrastructure Cases                      92 non-null     float64
 30  Lack of Road Infrastructure Injured                    92 non-null     float64
 31  Lack of Road Infrastructure Died                       92 non-null     float64
 32  Vehicles Parking at Road Shoulders Cases               92 non-null     float64
 33  Vehicles Parking at Road Shoulders Injured             92 non-null     float64
 34  Vehicles Parking at Road Shoulders Died                92 non-null     float64
 35  Causes Not Known Cases                                 92 non-null     float64
 36  Causes Not Known Injured                               92 non-null     float64
 37  Causes Not Known Died                                  92 non-null     float64
 38  Other Causes Cases                                     92 non-null     float64
 39  Other Causes Injured                                   92 non-null     float64
 40  Other Causes Died                                      92 non-null     float64
 41  Total Road Accidents Cases                             92 non-null     float64
 42  Total Road Accidents Injured                           92 non-null     float64
 43  Total Road Accidents Died                              92 non-null     float64
dtypes: float64(15), int64(28), object(1)
memory usage: 32.1+ KB
#Check Null Values
df.isnull().sum()
Accident_Index                     0
Accident Date                      0
Month                              0
Year                               0
Day_of_Week                        0
Junction_Control                   0
Junction_Detail                    0
Accident_Severity                  0
Latitude                           0
Light_Conditions                   0
Local_Authority_(District)         0
Carriageway_Hazards           302549
Longitude                          0
Number_of_Casualties               0
Number_of_Vehicles                 0
Police_Force                       0
Road_Surface_Conditions          317
Road_Type                       1534
Speed_limit                        0
Time                              17
Urban_or_Rural_Area                0
Weather_Conditions              6057
Vehicle_Type                       0
dtype: int64
df_states.isnull().sum()
_id                                                      0
State/UT/City                                            0
Dangerous or Careless Driving/ Overtaking etc Cases      0
Dangerous or Careless Driving/ Overtaking etc Injured    0
Dangerous or Careless Driving/ Overtaking etc Died       0
Overspeeding Cases                                       0
Overspeeding Injured                                     0
Overspeeding Died                                        0
Driving under Influence of Drug/Alcohol Cases            0
Driving under Influence of Drug/Alcohol Injured          0
Driving under Influence of Drug/Alcohol Died             0
Physical Fatigue of Drivers Cases                        0
Physical Fatigue of Drivers Injured                      0
Physical Fatigue of Drivers Died                         0
Defect in Mechanical Condition of Vehicle Cases          0
Defect in Mechanical Condition of Vehicle Injured        0
Defect in Mechanical Condition of Vehicle Died           0
Animal Crossing Cases                                    0
Animal Crossing Injured                                  0
Animal Crossing Died                                     0
Weather Condition (Total) Cases                          0
Weather Condition (Total) Injured                        0
Weather Condition (Total) Died                           0
Weather Condition (Poor Visibility) Cases                0
Weather Condition (Poor Visibility) Injured              0
Weather Condition (Poor Visibility) Died                 0
Weather Condition (Other Causes) Cases                   0
Weather Condition (Other Causes) Injured                 0
Weather Condition (Other Causes) Died                    0
Lack of Road Infrastructure Cases                        1
Lack of Road Infrastructure Injured                      1
Lack of Road Infrastructure Died                         1
Vehicles Parking at Road Shoulders Cases                 1
Vehicles Parking at Road Shoulders Injured               1
Vehicles Parking at Road Shoulders Died                  1
Causes Not Known Cases                                   1
Causes Not Known Injured                                 1
Causes Not Known Died                                    1
Other Causes Cases                                       1
Other Causes Injured                                     1
Other Causes Died                                        1
Total Road Accidents Cases                               1
Total Road Accidents Injured                             1
Total Road Accidents Died                                1
dtype: int64
# Drop 'Carriageway_Hazards': 302,549 of 307,973 values (about 98%) are missing
df.drop('Carriageway_Hazards', axis=1, inplace=True)
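`Carriageway_Hazards` is missing in 302,549 of 307,973 rows (about 98.2%). Making the drop rule explicit keeps the decision reproducible if the data is refreshed. A minimal sketch, assuming a hypothetical helper `drop_sparse_columns` and an illustrative cut-off (the notebook itself drops the column by name, not by threshold):

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(frame: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Return a new frame without columns whose missing fraction exceeds `threshold`.

    `drop_sparse_columns` and the 0.9 default are illustrative, not from the original notebook.
    """
    missing_fraction = frame.isna().mean()  # share of NaN values in each column
    sparse_cols = missing_fraction[missing_fraction > threshold].index
    return frame.drop(columns=sparse_cols)


# Toy frame: one column is 90% empty, mirroring the pattern in Carriageway_Hazards.
demo = pd.DataFrame({
    "Road_Type": ["Single carriageway"] * 10,
    "Carriageway_Hazards": [np.nan] * 9 + ["Hazard"],
})
print(demo.isna().mean())
print(drop_sparse_columns(demo, threshold=0.8).columns.tolist())  # ['Road_Type']
```

On the frame as loaded, a 0.9 threshold would drop only `Carriageway_Hazards`, since the next-sparsest column, `Weather_Conditions`, is missing 6,057 values (about 2%).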

Data Cleaning

Renaming Columns

# Renaming columns
df_states.rename(columns={
    '_id': 'id',
    'State/UT/City': 'state_city',
    'Dangerous or Careless Driving/ Overtaking etc Cases': 'dangerous_driving_cases',
    'Dangerous or Careless Driving/ Overtaking etc Injured': 'dangerous_driving_injured',
    'Dangerous or Careless Driving/ Overtaking etc Died': 'dangerous_driving_died',
    'Overspeeding Cases': 'overspeeding_cases',
    'Overspeeding Injured': 'overspeeding_injured',
    'Overspeeding Died': 'overspeeding_died',
    'Driving under Influence of Drug/Alcohol Cases': 'drunk_driving_cases',
    'Driving under Influence of Drug/Alcohol Injured': 'drunk_driving_injured',
    'Driving under Influence of Drug/Alcohol Died': 'drunk_driving_died',
    'Physical Fatigue of Drivers Cases': 'fatigue_cases',
    'Physical Fatigue of Drivers Injured': 'fatigue_injured',
    'Physical Fatigue of Drivers Died': 'fatigue_died',
    'Defect in Mechanical Condition of Vehicle Cases': 'mechanical_defect_cases',
    'Defect in Mechanical Condition of Vehicle Injured': 'mechanical_defect_injured',
    'Defect in Mechanical Condition of Vehicle Died': 'mechanical_defect_died',
    'Animal Crossing Cases': 'animal_crossing_cases',
    'Animal Crossing Injured': 'animal_crossing_injured',
    'Animal Crossing Died': 'animal_crossing_died',
    'Weather Condition (Total) Cases': 'weather_total_cases',
    'Weather Condition (Total) Injured': 'weather_total_injured',
    'Weather Condition (Total) Died': 'weather_total_died',
    'Weather Condition (Poor Visibility) Cases': 'poor_visibility_cases',
    'Weather Condition (Poor Visibility) Injured': 'poor_visibility_injured',
    'Weather Condition (Poor Visibility) Died': 'poor_visibility_died',
    'Weather Condition (Other Causes) Cases': 'weather_other_cases',
    'Weather Condition (Other Causes) Injured': 'weather_other_injured',
    'Weather Condition (Other Causes) Died': 'weather_other_died',
    'Lack of Road Infrastructure Cases': 'road_infrastructure_cases',
    'Lack of Road Infrastructure Injured': 'road_infrastructure_injured',
    'Lack of Road Infrastructure Died': 'road_infrastructure_died',
    'Vehicles Parking at Road Shoulders Cases': 'parking_shoulder_cases',
    'Vehicles Parking at Road Shoulders Injured': 'parking_shoulder_injured',
    'Vehicles Parking at Road Shoulders Died': 'parking_shoulder_died',
    'Causes Not Known Cases': 'unknown_causes_cases',
    'Causes Not Known Injured': 'unknown_causes_injured',
    'Causes Not Known Died': 'unknown_causes_died',
    'Other Causes Cases': 'other_causes_cases',
    'Other Causes Injured': 'other_causes_injured',
    'Other Causes Died': 'other_causes_died',
    'Total Road Accidents Cases': 'total_accidents_cases',
    'Total Road Accidents Injured': 'total_accidents_injured',
    'Total Road Accidents Died': 'total_accidents_died'
}, inplace=True)
df_states.head()
id state_city dangerous_driving_cases dangerous_driving_injured dangerous_driving_died overspeeding_cases overspeeding_injured overspeeding_died drunk_driving_cases drunk_driving_injured ... parking_shoulder_died unknown_causes_cases unknown_causes_injured unknown_causes_died other_causes_cases other_causes_injured other_causes_died total_accidents_cases total_accidents_injured total_accidents_died
0 1 ANDHRA PRADESH 2185 2271 755 16631 16188 6371 119 64 ... 18.0 121.0 119.0 32.0 2129.0 1957.0 817.0 21556.0 21040.0 8186.0
1 2 ARUNACHAL PRADESH 65 59 40 120 127 74 3 6 ... 0.0 9.0 4.0 7.0 38.0 37.0 28.0 261.0 266.0 173.0
2 3 ASSAM 886 833 347 4303 3237 1946 288 201 ... 45.0 42.0 0.0 10.0 89.0 95.0 21.0 7069.0 5420.0 3014.0
3 4 BIHAR 5039 4134 4071 2886 2348 2284 51 53 ... 95.0 20.0 12.0 22.0 101.0 70.0 77.0 9553.0 7946.0 7660.0
4 5 CHHATTISGARH 3536 3258 1750 6378 5603 2723 145 159 ... 71.0 455.0 220.0 258.0 1163.0 917.0 445.0 12395.0 10682.0 5413.0

5 rows × 44 columns
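`DataFrame.rename` ignores mapping keys that match no column by default, so a typo in a long mapping like the one above goes unnoticed and leaves the original long name in place. Passing `errors="raise"` makes pandas raise a `KeyError` instead. A small sketch on a toy frame that uses three of the original column names:

```python
import pandas as pd

# Toy frame with three of the original column names.
raw = pd.DataFrame(columns=["_id", "State/UT/City", "Overspeeding Cases"])

# Mapping with a deliberate typo ("Over-speeding") to show the failure mode.
mapping = {
    "_id": "id",
    "State/UT/City": "state_city",
    "Over-speeding Cases": "overspeeding_cases",
}

# Default errors="ignore": the misspelled key is skipped without any warning.
renamed = raw.rename(columns=mapping)
print(renamed.columns.tolist())  # ['id', 'state_city', 'Overspeeding Cases']

# errors="raise": pandas raises a KeyError naming the labels it could not find.
try:
    raw.rename(columns=mapping, errors="raise")
except KeyError as exc:
    print("Unmatched labels:", exc)
```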

# Imputing numerical columns with mean
numerical_columns = df_states.select_dtypes(include=['float64'])
numerical_columns = numerical_columns.columns[numerical_columns.isnull().any()]
df_states[numerical_columns] = df_states[numerical_columns].fillna(df_states[numerical_columns].mean())

missing_counts = df_states.isnull().sum()
print(missing_counts)
id                             0
state_city                     0
dangerous_driving_cases        0
dangerous_driving_injured      0
dangerous_driving_died         0
overspeeding_cases             0
overspeeding_injured           0
overspeeding_died              0
drunk_driving_cases            0
drunk_driving_injured          0
drunk_driving_died             0
fatigue_cases                  0
fatigue_injured                0
fatigue_died                   0
mechanical_defect_cases        0
mechanical_defect_injured      0
mechanical_defect_died         0
animal_crossing_cases          0
animal_crossing_injured        0
animal_crossing_died           0
weather_total_cases            0
weather_total_injured          0
weather_total_died             0
poor_visibility_cases          0
poor_visibility_injured        0
poor_visibility_died           0
weather_other_cases            0
weather_other_injured          0
weather_other_died             0
road_infrastructure_cases      0
road_infrastructure_injured    0
road_infrastructure_died       0
parking_shoulder_cases         0
parking_shoulder_injured       0
parking_shoulder_died          0
unknown_causes_cases           0
unknown_causes_injured         0
unknown_causes_died            0
other_causes_cases             0
other_causes_injured           0
other_causes_died              0
total_accidents_cases          0
total_accidents_injured        0
total_accidents_died           0
dtype: int64
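The mean used above is computed over all 93 rows, which include aggregate rows such as `TOTAL (CITIES)` (and, judging from the filter used later, `TOTAL (STATES)`). Those rows are far larger than any single state or city, so they pull the fill value upward. A minimal sketch of an alternative that computes the fill values from non-aggregate rows only, assuming every aggregate row's `state_city` starts with `TOTAL`; `mean_impute_excluding_totals` is a hypothetical helper:

```python
import numpy as np
import pandas as pd


def mean_impute_excluding_totals(frame: pd.DataFrame, label_col: str = "state_city") -> pd.DataFrame:
    """Fill NaNs in float64 columns with means computed over non-aggregate rows only.

    Assumes aggregate rows are exactly those whose label starts with "TOTAL".
    """
    out = frame.copy()
    is_total = out[label_col].str.upper().str.startswith("TOTAL")
    float_cols = out.select_dtypes(include="float64").columns
    cols_with_nan = [c for c in float_cols if out[c].isna().any()]
    fill_values = out.loc[~is_total, cols_with_nan].mean()  # means over real states/cities
    out[cols_with_nan] = out[cols_with_nan].fillna(fill_values)
    return out


demo = pd.DataFrame({
    "state_city": ["STATE A", "STATE B", "STATE C", "TOTAL (STATES)"],
    "total_accidents_died": [100.0, 300.0, np.nan, 400.0],
})
# STATE C is filled with (100 + 300) / 2 = 200, not (100 + 300 + 400) / 3
print(mean_impute_excluding_totals(demo))
```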

Handling Missing Values

# Imputing categorical columns with mode
categorical_columns = df.select_dtypes(include=['object'])
categorical_columns = categorical_columns.columns[categorical_columns.isnull().any()]
df[categorical_columns] = df[categorical_columns].fillna(df[categorical_columns].mode().iloc[0])

# Verifying if all missing values are handled
missing_counts = df.isnull().sum()
print(missing_counts)
Accident_Index                0
Accident Date                 0
Month                         0
Year                          0
Day_of_Week                   0
Junction_Control              0
Junction_Detail               0
Accident_Severity             0
Latitude                      0
Light_Conditions              0
Local_Authority_(District)    0
Longitude                     0
Number_of_Casualties          0
Number_of_Vehicles            0
Police_Force                  0
Road_Surface_Conditions       0
Road_Type                     0
Speed_limit                   0
Time                          0
Urban_or_Rural_Area           0
Weather_Conditions            0
Vehicle_Type                  0
dtype: int64
#Statistical Overview
df_num = df.select_dtypes(include = 'number')  # 'number' covers every integer and float width
df_num.describe()
Year Latitude Longitude Number_of_Casualties Number_of_Vehicles Speed_limit
count 307973.000000 307973.000000 307973.000000 307973.000000 307973.000000 307973.000000
mean 2021.468934 52.487005 -1.368884 1.356882 1.829063 38.866037
std 0.499035 1.339011 1.356092 0.815857 0.710477 14.032933
min 2021.000000 49.914488 -7.516225 1.000000 1.000000 10.000000
25% 2021.000000 51.485248 -2.247937 1.000000 1.000000 30.000000
50% 2021.000000 52.225943 -1.349258 1.000000 2.000000 30.000000
75% 2022.000000 53.415517 -0.206810 1.000000 2.000000 50.000000
max 2022.000000 60.598055 1.759398 48.000000 32.000000 70.000000
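The summary above reports latitudes between about 49.9 and 60.6 and longitudes between about -7.5 and 1.8. Because the analysis is framed around India, a bounding-box check is a cheap way to confirm that geocoded records fall inside the expected region. A minimal sketch; the box is a rough approximation of India (about 6 to 38 °N, 68 to 98 °E) chosen for illustration, not an official boundary, and `share_inside_bbox` is a hypothetical helper:

```python
import pandas as pd

# Rough, illustrative bounding box for India in degrees; not an official boundary.
INDIA_BBOX = {"lat_min": 6.0, "lat_max": 38.0, "lon_min": 68.0, "lon_max": 98.0}


def share_inside_bbox(frame: pd.DataFrame, bbox: dict) -> float:
    """Fraction of rows whose Latitude and Longitude both fall inside `bbox` (inclusive)."""
    inside = (
        frame["Latitude"].between(bbox["lat_min"], bbox["lat_max"])
        & frame["Longitude"].between(bbox["lon_min"], bbox["lon_max"])
    )
    return float(inside.mean())


# Toy points: the minimum and maximum coordinates from the summary above,
# paired for illustration, plus an approximate location for New Delhi.
points = pd.DataFrame({
    "Latitude": [49.914488, 60.598055, 28.61],
    "Longitude": [-7.516225, 1.759398, 77.21],
})
print(share_inside_bbox(points, INDIA_BBOX))  # one of the three points is inside
```

Applied to the full frame, `share_inside_bbox(df, INDIA_BBOX)` gives the fraction of records inside the box; a value far below 1 suggests the coordinates deserve a closer look before drawing location-based conclusions.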

The data is now clean.

Highlights of the Report

Demographic Impact:

Young adults in the age group of 18 - 45 years accounted for 66.5% of the victims in 2022. Additionally, people in the working age group of 18 – 60 years constituted 83.4% of the total road accident fatalities.

Road-User Categories:

Among road-user categories, two-wheeler riders had the highest share in total fatalities, representing 44.5% of persons killed in road accidents in 2022. Pedestrian road-users were the second-largest group, with 19.5% of fatalities.

International Comparison:

India has the highest number of total persons killed due to road accidents, followed by China and the United States. Venezuela has the highest rate of persons killed per 1,00,000 population.

📌 Total casualties across all accidents

Number of Road Accidents:

In 2022, a total of 4,61,312 road accidents occurred in India, leading to 1,68,491 fatalities and 4,43,366 people injured. These figures represent an 11.9% year-on-year increase in accidents, a 9.4% rise in fatalities, and a substantial 15.3% increase in the number of people injured compared with 2021.
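The year-on-year percentages can be turned around to recover the approximate 2021 baseline: if a value grew by p%, last year's value is this year's value divided by (1 + p/100). A minimal sketch using the fatality and injury figures quoted above; because the published percentages are rounded to one decimal place, the results are approximations, not official 2021 counts. `implied_previous` is a hypothetical helper:

```python
def implied_previous(current: float, pct_increase: float) -> float:
    """Back out last year's value from this year's value and a % year-on-year increase."""
    return current / (1 + pct_increase / 100)


fatalities_2022, fatalities_growth = 168_491, 9.4
injured_2022, injured_growth = 443_366, 15.3

print(round(implied_previous(fatalities_2022, fatalities_growth)))  # roughly 1.54 lakh
print(round(implied_previous(injured_2022, injured_growth)))        # roughly 3.85 lakh
```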

print('Total casualties across all accidents:', df['Number_of_Casualties'].sum())
Total casualties across all accidents: 417883

📌 Number and percentage of accidents by severity, and accidents by vehicle type


Road Accident Distribution:

32.9% of accidents took place on National Highways and Expressways, 23.1% on State Highways, and the remaining 43.9% on other roads. 36.2% of fatalities occurred on National Highways, 24.3% on State Highways, and 39.4% on other roads.

df1 = df['Accident_Severity'].value_counts()
df1
Accident_Severity
Slight     263280
Serious     40740
Fatal        3953
Name: count, dtype: int64
sns.barplot(x = df1.index, y = df1.values, palette = 'bright')
plt.title('Number of Accidents by Severity')
plt.xlabel('Severity')
plt.ylabel('Number of Accidents')
plt.show()

df2 = df['Accident_Severity'].value_counts(normalize = True)
df2
Accident_Severity
Slight     0.854880
Serious    0.132284
Fatal      0.012836
Name: proportion, dtype: float64
plt.figure(figsize=(6,6))
colors = sns.color_palette('bright')
df2.plot(kind = 'pie', autopct = '%2.2f%%', colors = colors, startangle = 70)
plt.ylabel(None)
plt.show()
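`value_counts()` counts rows, that is accidents, not casualties. Because one accident can involve several casualties (`Number_of_Casualties` ranges from 1 to 48 in the summary statistics), summing that column per severity gives casualty totals. A minimal sketch on a toy frame with the same column names; `casualties_by_severity` is a hypothetical helper:

```python
import pandas as pd


def casualties_by_severity(frame: pd.DataFrame) -> pd.DataFrame:
    """Return accidents, casualties and casualty share for each severity level."""
    summary = frame.groupby("Accident_Severity").agg(
        accidents=("Number_of_Casualties", "count"),
        casualties=("Number_of_Casualties", "sum"),
    )
    summary["casualty_share"] = summary["casualties"] / summary["casualties"].sum()
    return summary.sort_values("casualties", ascending=False)


toy = pd.DataFrame({
    "Accident_Severity": ["Slight", "Slight", "Serious", "Fatal"],
    "Number_of_Casualties": [1, 2, 3, 4],
})
print(casualties_by_severity(toy))
```

On the real data, `casualties_by_severity(df)` puts accident counts and casualty totals side by side, so the two measures are not confused.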

📌 Accidents by vehicle type

Vehicle Categories:

Two-wheelers, for the second consecutive year, accounted for the highest share in both total accidents and fatalities in 2022. Light vehicles, including cars, jeeps, and taxis, ranked a distant second.

df3 = df['Vehicle_Type'].value_counts().sort_values(ascending = False)
df3
Vehicle_Type
Car                                      239794
Van / Goods 3.5 tonnes mgw or under       15695
Motorcycle over 500cc                     11226
Bus or coach (17 or more pass seats)       8686
Motorcycle 125cc and under                 6852
Goods 7.5 tonnes mgw and over              6532
Taxi/Private hire car                      5543
Motorcycle 50cc and under                  3703
Motorcycle over 125cc and up to 500cc      3285
Other vehicle                              2516
Goods over 3.5t. and under 7.5t            2502
Minibus (8 - 16 passenger seats)            821
Agricultural vehicle                        749
Pedal cycle                                  66
Ridden horse                                  3
Name: count, dtype: int64

Cars account for by far the highest number of accidents (239,794), so the next chart leaves them out to keep the other vehicle types readable.

Excluding cars

df_nocar = df[df['Vehicle_Type'] != 'Car']
df_11 = df_nocar['Vehicle_Type'].value_counts().head(8)  # value_counts() already sorts in descending order
sns.barplot(x = df_11.values, y = df_11.index, palette = 'viridis')
plt.title('Top 8 Vehicle Types by Number of Accidents (excluding cars)')
plt.xlabel('Number of Accidents')
plt.show()

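# value_counts() orders months by frequency, which makes the lines zigzag; reindex to calendar order.
# Assumes three-letter month labels such as 'Jan', as shown in df.head().
month_order = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
df4 = df[df['Year'] == 2021]['Month'].value_counts().reindex(month_order)
df5 = df[df['Year'] == 2022]['Month'].value_counts().reindex(month_order)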
plt.figure(figsize=(10, 6))
plt.plot(df4.index, df4.values, marker='o', label='2021')
plt.plot(df5.index, df5.values, marker='o', label='2022')
plt.xlabel('Month')
plt.ylabel('Count')
plt.title('Monthly Accidents Comparison for 2021 and 2022')
plt.legend()
plt.grid(True)
plt.show()

📌 Accidents by road type

df6 = df['Road_Type'].value_counts()
df6
Road_Type
Single carriageway    230612
Dual carriageway       45467
Roundabout             20929
One way street          6197
Slip road               3234
Name: count, dtype: int64
plt.figure(figsize=(10,6))
sns.barplot(x = df6.index, y = df6.values, palette = 'magma')
plt.xlabel('Road Type')
plt.ylabel('Number of Accidents')
plt.title('Accidents by Road Type')
plt.show()

📌 Casualties by area (urban/rural) and light conditions

grouped_data = df.groupby(['Urban_or_Rural_Area','Light_Conditions'])['Number_of_Casualties'].sum().reset_index()
grouped_data
Urban_or_Rural_Area Light_Conditions Number_of_Casualties
0 Rural Darkness - lighting unknown 1620
1 Rural Darkness - lights lit 16724
2 Rural Darkness - lights unlit 658
3 Rural Darkness - no lighting 24215
4 Rural Daylight 118802
5 Urban Darkness - lighting unknown 2209
6 Urban Darkness - lights lit 65443
7 Urban Darkness - lights unlit 880
8 Urban Darkness - no lighting 1171
9 Urban Daylight 186161
pivot_table = grouped_data.pivot_table(index='Urban_or_Rural_Area', columns='Light_Conditions', values='Number_of_Casualties', fill_value=0)
pivot_table
Light_Conditions Darkness - lighting unknown Darkness - lights lit Darkness - lights unlit Darkness - no lighting Daylight
Urban_or_Rural_Area
Rural 1620.0 16724.0 658.0 24215.0 118802.0
Urban 2209.0 65443.0 880.0 1171.0 186161.0
plt.figure(figsize=(12,6))
sns.barplot(x = 'Urban_or_Rural_Area' , y = 'Number_of_Casualties', hue = 'Light_Conditions', data = grouped_data, palette = 'viridis' )
plt.xlabel('Region')
plt.ylabel('Casualties')
plt.title('Casualties by Area/Light Conditions')
plt.show()
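Urban areas record far more casualties overall than rural areas, so raw bar heights hide how the *mix* of light conditions differs. Dividing each row of the pivot table by its own total gives percentages that compare directly. A minimal sketch using the casualty totals printed in the pivot table above; on the notebook's own objects the same line would be `pivot_table.div(pivot_table.sum(axis=1), axis=0) * 100`:

```python
import pandas as pd

# Casualty totals copied from the pivot table above.
pivot = pd.DataFrame(
    {
        "Darkness - lighting unknown": [1620, 2209],
        "Darkness - lights lit": [16724, 65443],
        "Darkness - lights unlit": [658, 880],
        "Darkness - no lighting": [24215, 1171],
        "Daylight": [118802, 186161],
    },
    index=pd.Index(["Rural", "Urban"], name="Urban_or_Rural_Area"),
)

# Divide each row by its own total so rural and urban are compared as percentages.
shares = pivot.div(pivot.sum(axis=1), axis=0) * 100
print(shares.round(1))
```

In these figures daylight accounts for roughly 73% of casualties in both areas, while unlit roads ("Darkness - no lighting") account for about 15% of rural casualties against under 1% of urban ones.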

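📌 Speed limit vs. number of casualties

plt.figure(figsize=(12,6))
# sns.lineplot draws the mean of y at each x (its default estimator) with a confidence band
sns.lineplot(x = 'Speed_limit', y = 'Number_of_Casualties', data = df)
plt.xlabel('Speed Limit')
plt.ylabel('Mean Casualties per Accident')
plt.title('Mean Casualties per Accident by Speed Limit')
plt.grid(True)
plt.show()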

📌 Number of vehicles vs. number of casualties

plt.figure(figsize=(12,6))
sns.lineplot(x = 'Number_of_Vehicles', y = 'Number_of_Casualties', data = df)
plt.xlabel('Number of Vehicles')
plt.ylabel('Mean Casualties per Accident')
plt.title('Mean Casualties per Accident by Number of Vehicles')
plt.grid(True)
plt.show()
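Because `sns.lineplot` plots a mean, points backed by very few accidents (the summary statistics show up to 32 vehicles in a single accident) can look dramatic while resting on a handful of rows. Tabulating the mean next to the number of accidents behind it makes this visible. A minimal sketch with a hypothetical helper `casualty_summary` on toy data:

```python
import pandas as pd


def casualty_summary(frame: pd.DataFrame, by: str) -> pd.DataFrame:
    """Mean casualties per accident plus the number of accidents behind each mean."""
    return frame.groupby(by)["Number_of_Casualties"].agg(["mean", "count"])


toy = pd.DataFrame({
    "Number_of_Vehicles": [1, 1, 2, 2, 2, 32],
    "Number_of_Casualties": [1, 1, 1, 2, 3, 5],
})
print(casualty_summary(toy, "Number_of_Vehicles"))
```

On the real data, `casualty_summary(df, 'Speed_limit')` and `casualty_summary(df, 'Number_of_Vehicles')` show which points on the two charts rest on thin evidence.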

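📌 Accidents by day of the week

# value_counts() sorts by frequency; reindex so the bars run Monday to Sunday
day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
df7 = df['Day_of_Week'].value_counts().reindex(day_order)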

plt.figure(figsize=(12, 5))
sns.barplot(x = df7.index, y = df7.values, palette = 'gnuplot_d')
plt.xlabel('Day of the Week')
plt.ylabel('Number of Accidents')
plt.title('Number of Accidents by Day of the Week')
plt.grid(True)
plt.show()

📌 Urban vs. rural accidents

Rural vs. Urban Accidents:

About 68% of road accident deaths occurred in rural areas, with urban areas contributing 32% to the total accident deaths in the country.

df['Urban_or_Rural_Area'].value_counts()
Urban_or_Rural_Area
Urban    198532
Rural    109441
Name: count, dtype: int64
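The report figure above describes deaths, whereas `value_counts()` counts accident records, so the two measure different things and are not directly comparable. Converting the counts to shares at least puts them on the same percentage scale. A minimal sketch using the counts printed above (on the full frame, `df['Urban_or_Rural_Area'].value_counts(normalize=True)` gives the same proportions as fractions):

```python
import pandas as pd

# Accident counts copied from the value_counts() output above.
area_counts = pd.Series({"Urban": 198532, "Rural": 109441}, name="count")
area_share = (area_counts / area_counts.sum() * 100).round(1)
print(area_share)  # Urban 64.5, Rural 35.5
```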
# Count accidents by Urban/Rural Area and Road Surface Conditions
surface_counts = df.groupby(['Urban_or_Rural_Area', 'Road_Surface_Conditions']).size().unstack()

# Plot stacked column chart
surface_counts.plot(kind='bar', stacked=True, figsize=(8, 6))
plt.xlabel('Urban or Rural Area')
plt.ylabel('Number of Accidents')
plt.title('Distribution of Accidents by Road Surface Conditions and Urban/Rural Area')
plt.xticks(rotation=0)
plt.legend(title='Road Surface Conditions')
plt.show()

State-Specific Data:

Tamil Nadu recorded the highest number of road accidents in 2022, with 13.9% of the total accidents, followed by Madhya Pradesh with 11.8%. Uttar Pradesh had the highest number of fatalities due to road accidents (13.4%), followed by Tamil Nadu (10.6%). Understanding state-specific trends is essential for targeted interventions.

📌 Total accidents and deaths by State/UT

df8 = df_states[(df_states['state_city'] != 'TOTAL (STATES)') & (df_states['state_city'] !='TOTAL (CITIES)')]
df_cases = df8.groupby('state_city')['total_accidents_cases'].sum().sort_values(ascending = False).head(10)
plt.figure(figsize=(12, 5))
sns.barplot(x = df_cases.index, y =df_cases.values, palette = 'magma')
plt.xlabel('State/UT')
plt.ylabel('Total Accident Cases')
plt.title('Number of Accidents by States')
plt.xticks(rotation = 90)
plt.grid(True)
plt.show()

# Reuse df8 (aggregate TOTAL rows already removed)
df_died = df8.groupby('state_city')['total_accidents_died'].sum().sort_values(ascending = False).head(10)
sns.barplot(x = df_died.values, y = df_died.index, palette = 'coolwarm')
plt.xlabel('Total Deaths')
plt.ylabel('State/UT')
plt.title('Number of Deaths by State/UT')
plt.grid(True)
plt.show()

📌 State/UT-wise total number of persons killed in road accidents on State Highways (2018)

import json
import plotly.express as px
import plotly.io as pio
pio.renderers.default = "notebook"
# Load the state boundaries; the context manager closes the file afterwards
with open("states_india.geojson", "r") as f:
    india_states = json.load(f)
state_id_map = {}
for feature in india_states["features"]:
    feature["id"] = feature["properties"]["state_code"]
    state_id_map[feature["properties"]["st_nm"]] = feature["id"]
df_s = pd.read_excel("state wise accident deaths.xlsx")
df_s["id"] = df_s["States/UTs"].apply(lambda x: state_id_map[x])
india_states['features'][1]['properties']
{'cartodb_id': 2, 'state_code': 35, 'st_nm': 'Andaman & Nicobar Island'}
df_s.head()
States/UTs Killed in Highway accident 2018 id
0 Andhra Pradesh 1897 28
1 Arunanchal Pradesh 58 12
2 Assam 681 18
3 Bihar 1474 10
4 Chhattisgarh 1068 22
fig = px.choropleth_mapbox(
    df_s,
    locations="id",
    geojson=india_states,
    color="Killed in Highway accident 2018",
    hover_name="States/UTs",
    hover_data=["Killed in Highway accident 2018"],
    title="State/UT-wise Number of Persons Killed in Road Accidents on State Highways (2018)",
    mapbox_style="carto-positron",
    center={"lat": 24, "lon": 78},
    zoom=3,
    opacity=0.5
)
# The map extent is set by `center` and `zoom`; update_geos() only affects geo subplots, not mapbox
fig.show()
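The `lambda x: state_id_map[x]` lookup stops with a bare `KeyError` at the first name spelled differently from the GeoJSON (the spreadsheet uses `Arunanchal Pradesh`, for instance, so spellings must agree exactly). A minimal sketch of a lookup that reports every unmatched name at once; `map_state_ids` is a hypothetical helper and the mapping below is a toy subset shaped like `state_id_map`:

```python
import pandas as pd


def map_state_ids(frame: pd.DataFrame, name_col: str, id_map: dict) -> pd.Series:
    """Map state names to GeoJSON ids, raising one readable error for all unmatched names."""
    ids = frame[name_col].map(id_map)
    unmatched = frame.loc[ids.isna(), name_col].tolist()
    if unmatched:
        raise ValueError(f"No GeoJSON match for: {unmatched}")
    return ids


# Toy mapping in the shape of state_id_map built above.
state_id_map_demo = {"Andhra Pradesh": 28, "Assam": 18}
df_demo = pd.DataFrame({"States/UTs": ["Andhra Pradesh", "Assam"]})
print(map_state_ids(df_demo, "States/UTs", state_id_map_demo).tolist())  # [28, 18]
```

With the real objects, `df_s["id"] = map_state_ids(df_s, "States/UTs", state_id_map)` would replace the `apply` line.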

📌 Worldwide road traffic deaths (1990-2019)

import plotly.offline as py
df_nation = pd.read_csv('country_accidents.csv')
df_nation.head()
Entity Code Year Deaths Sidedness Historical_Population
0 Afghanistan AFG 1990 4154 0 12412311.0
1 Afghanistan AFG 1991 4472 0 13299016.0
2 Afghanistan AFG 1992 5106 0 14485543.0
3 Afghanistan AFG 1993 5681 0 15816601.0
4 Afghanistan AFG 1994 6001 0 17075728.0

Total deaths by Entity (countries and regional aggregates)

df_nation.groupby('Entity')['Deaths'].sum().sort_values(ascending = False).reset_index().head(10)
Entity Deaths
0 World 36317087
1 G20 23328740
2 Asia 21670793
3 World Bank Upper Middle Income 16041327
4 Middle SDI 13623644
5 East Asia & Pacific - World Bank region 13035092
6 World Bank Lower Middle Income 12599627
7 Southeast Asia, East Asia, and Oceania 12411258
8 Western Pacific Region 10454671
9 Commonwealth 8831208
df_nation['Entity'].nunique()
267
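The top-10 list above consists of aggregates (World, G20, Asia, World Bank income groups), and they are counted among the 267 distinct `Entity` values, so these totals mix countries with regions. A minimal sketch that keeps individual countries only, assuming (as in Our World in Data exports, which the `Entity`/`Code`/`Year` layout resembles) that aggregates have an empty `Code` or one starting with `OWID_`. The helper `country_totals` and the toy rows are hypothetical; the `Code` values in `country_accidents.csv` should be checked before relying on this rule.

```python
import pandas as pd


def country_totals(df):
    """Sum Deaths per Entity, keeping only rows whose Code looks like an ISO-3 country code."""
    # Assumption: aggregates have a missing Code or an 'OWID_' prefix (e.g. 'OWID_WRL'),
    # so a plain three-letter upper-case code identifies an individual country.
    is_country = df["Code"].fillna("").str.fullmatch(r"[A-Z]{3}")
    return (df[is_country]
            .groupby("Entity")["Deaths"]
            .sum()
            .sort_values(ascending=False))


# Toy rows shaped like country_accidents.csv (values are made up)
toy = pd.DataFrame({
    "Entity": ["Afghanistan", "Afghanistan", "World", "G20", "India"],
    "Code": ["AFG", "AFG", "OWID_WRL", None, "IND"],
    "Year": [1990, 1991, 1990, 1990, 1990],
    "Deaths": [10, 20, 500, 300, 40],
})
print(country_totals(toy))
```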
# One total per entity, so each location is paired with its own value
# (np.unique(Entity) paired with the per-year Deaths column misaligned names and values)
totals = df_nation.groupby('Entity', as_index = False)['Deaths'].sum()
data = [ dict(
        type = 'choropleth',
        locations = totals['Entity'],
        z = totals['Deaths'],
        locationmode = 'country names',
        text = totals['Entity'],
        marker = dict(
            line = dict(color = 'rgb(0,0,0)', width = 1)),
        colorbar = dict(title = 'Total Deaths by Road Accident')
        )
       ]

layout = dict(
    title = 'Worldwide total number of deaths from road traffic incidents from 1990 to 2019',
    geo = dict(
        showframe = False,
        showocean = True,
        oceancolor = 'rgb(0,255,255)',
        projection = dict(
            type = 'orthographic',
            rotation = dict(lon = 60, lat = 10)
        ),
        lonaxis = dict(showgrid = True, gridcolor = 'rgb(102, 102, 102)'),
        lataxis = dict(showgrid = True, gridcolor = 'rgb(102, 102, 102)')
    )
)

fig = dict(data=data, layout=layout)
py.iplot(fig, validate=False, filename='worldmap')

National and State-wise Comparison for India

National Trends:

At the national level, road accidents and fatalities may vary significantly due to differences in road infrastructure, traffic regulations, and enforcement practices. The national average provides a baseline for comparing individual states' performance.

Deaths and Accident Cases: Comparing deaths and accident cases at the national level highlights regions with higher accident rates and fatalities, which can help prioritize areas for safety interventions and resource allocation.

State-wise Insights:

High-Risk States: States with higher numbers of road accidents and fatalities require targeted interventions, such as improved road safety measures, better enforcement of traffic laws, and public awareness campaigns.

Effective States: States with lower accident rates and fatalities may have effective road safety policies and practices that could be modeled in other regions.

Regional Differences: Differences in road infrastructure, urbanization levels, and traffic density among states can influence accident rates. Tailored approaches are necessary to address state-specific challenges.
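Raw counts favour populous states, so comparing states against a national baseline is easier on a rate. A minimal sketch, assuming a state-level table with hypothetical columns `state`, `accidents` and `deaths` (the real column names in `State wise Accidents data.csv` may differ): it computes deaths per 100 accidents for each state and its gap to the national rate, taken here as total deaths over total accidents rather than the mean of the state rates.

```python
import pandas as pd


def compare_to_national(df):
    """Add deaths per 100 accidents and the gap to the pooled national rate."""
    out = df.copy()
    # National baseline: pooled rate over all states (not the average of state rates)
    national_rate = 100 * out["deaths"].sum() / out["accidents"].sum()
    out["deaths_per_100"] = 100 * out["deaths"] / out["accidents"]
    out["vs_national"] = out["deaths_per_100"] - national_rate
    # Highest rates first: candidates for "high-risk" states
    return out.sort_values("deaths_per_100", ascending=False), national_rate


# Made-up figures for three hypothetical states
toy = pd.DataFrame({"state": ["A", "B", "C"],
                    "accidents": [1000, 500, 500],
                    "deaths": [100, 100, 0]})
ranked, national = compare_to_national(toy)
print(national)  # 10.0
print(ranked[["state", "deaths_per_100", "vs_national"]])
```

A positive `vs_national` marks a state whose accidents are deadlier than the national average; the pooled rate is used so that small states do not weigh as much as large ones in the baseline.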

📌 Descriptive Statistics

Correlation between variables

from sklearn.preprocessing import LabelEncoder

# Keep only the columns used in the correlation analysis
columns_to_keep = ['Month', 'Junction_Control', 'Junction_Detail', 'Accident_Severity', 'Latitude',
                   'Light_Conditions', 'Local_Authority_(District)', 'Carriageway_Hazards', 'Longitude',
                   'Number_of_Casualties', 'Number_of_Vehicles', 'Police_Force', 'Road_Surface_Conditions',
                   'Road_Type', 'Speed_limit', 'Urban_or_Rural_Area', 'Weather_Conditions', 'Vehicle_Type']

# Work on an explicit copy so overwriting columns does not trigger SettingWithCopyWarning
df_corr = df[columns_to_keep].copy()

# LabelEncoder maps each category to an integer code in alphabetical order
label_encoder = LabelEncoder()

# Encode every text (object) column; numeric columns are left unchanged
for col in df_corr.select_dtypes(include='object').columns:
    df_corr[col] = label_encoder.fit_transform(df_corr[col].astype(str))

# Pearson correlation matrix of the encoded data
corr_matrix = df_corr.corr()

corr_matrix
Month Junction_Control Junction_Detail Accident_Severity Latitude Light_Conditions Local_Authority_(District) Carriageway_Hazards Longitude Number_of_Casualties Number_of_Vehicles Police_Force Road_Surface_Conditions Road_Type Speed_limit Urban_or_Rural_Area Weather_Conditions Vehicle_Type
Month 1.000000 0.003259 0.010447 -0.000800 -0.007685 0.000165 -0.000996 0.000569 0.009110 -0.009141 0.006287 -0.002062 -0.032542 0.005644 -0.020367 0.019713 -0.021083 -0.000137
Junction_Control 0.003259 1.000000 0.288568 -0.000400 -0.070671 0.042531 -0.001454 0.008143 -0.033068 0.000306 0.034596 -0.105783 0.008309 0.101385 0.025817 -0.071934 -0.005122 -0.006543
Junction_Detail 0.010447 0.288568 1.000000 0.029895 -0.027699 0.005400 0.000790 0.036481 0.045583 -0.042448 0.025797 0.006225 -0.028606 0.093675 -0.135112 0.104192 -0.014495 0.002393
Accident_Severity -0.000800 -0.000400 0.029895 1.000000 -0.019656 0.025237 -0.005521 0.003894 0.000240 -0.075685 0.077072 0.003962 0.015108 -0.014130 -0.078333 0.079010 0.035499 -0.002342
Latitude -0.007685 -0.070671 -0.027699 -0.019656 1.000000 0.008491 -0.055101 -0.007792 -0.365601 0.041867 -0.029020 0.077707 0.077591 0.002057 0.055816 -0.054922 0.037806 0.012560
Light_Conditions 0.000165 0.042531 0.005400 0.025237 0.008491 1.000000 0.007916 0.010625 -0.032699 -0.013825 0.055153 -0.025724 -0.173002 0.022608 0.089403 -0.108488 -0.120927 -0.004599
Local_Authority_(District) -0.000996 -0.001454 0.000790 -0.005521 -0.055101 0.007916 1.000000 -0.001687 -0.009510 0.001880 0.007789 0.115593 -0.003998 -0.006231 0.041666 -0.052353 -0.000046 -0.005372
Carriageway_Hazards 0.000569 0.008143 0.036481 0.003894 -0.007792 0.010625 -0.001687 1.000000 0.002684 -0.003429 0.038238 0.006561 -0.015999 0.015012 -0.070474 0.069826 -0.004238 -0.000588
Longitude 0.009110 -0.033068 0.045583 0.000240 -0.365601 -0.032699 -0.009510 0.002684 1.000000 -0.050665 0.000713 0.108034 -0.055747 -0.003080 -0.046843 0.097935 -0.043172 -0.003716
Number_of_Casualties -0.009141 0.000306 -0.042448 -0.075685 0.041867 -0.013825 0.001880 -0.003429 -0.050665 1.000000 0.234499 0.005492 0.036684 -0.048326 0.137064 -0.112428 0.006589 -0.002334
Number_of_Vehicles 0.006287 0.034596 0.025797 0.077072 -0.029020 0.055153 0.007789 0.038238 0.000713 0.234499 1.000000 -0.015658 -0.014061 -0.096143 0.079861 -0.035670 -0.011944 -0.003861
Police_Force -0.002062 -0.105783 0.006225 0.003962 0.077707 -0.025724 0.115593 0.006561 0.108034 0.005492 -0.015658 1.000000 0.002484 -0.026724 -0.053232 0.093195 -0.013471 0.001302
Road_Surface_Conditions -0.032542 0.008309 -0.028606 0.015108 0.077591 -0.173002 -0.003998 -0.015999 -0.055747 0.036684 -0.014061 0.002484 1.000000 -0.009246 0.095619 -0.094933 0.507601 -0.001999
Road_Type 0.005644 0.101385 0.093675 -0.014130 0.002057 0.022608 -0.006231 0.015012 -0.003080 -0.048326 -0.096143 -0.026724 -0.009246 1.000000 -0.335244 0.087865 -0.002802 0.002544
Speed_limit -0.020367 0.025817 -0.135112 -0.078333 0.055816 0.089403 0.041666 -0.070474 -0.046843 0.137064 0.079861 -0.053232 0.095619 -0.335244 1.000000 -0.683529 0.031258 -0.002081
Urban_or_Rural_Area 0.019713 -0.071934 0.104192 0.079010 -0.054922 -0.108488 -0.052353 0.069826 0.097935 -0.112428 -0.035670 0.093195 -0.094933 0.087865 -0.683529 1.000000 -0.032542 0.005930
Weather_Conditions -0.021083 -0.005122 -0.014495 0.035499 0.037806 -0.120927 -0.000046 -0.004238 -0.043172 0.006589 -0.011944 -0.013471 0.507601 -0.002802 0.031258 -0.032542 1.000000 -0.000585
Vehicle_Type -0.000137 -0.006543 0.002393 -0.002342 0.012560 -0.004599 -0.005372 -0.000588 -0.003716 -0.002334 -0.003861 0.001302 -0.001999 0.002544 -0.002081 0.005930 -0.000585 1.000000
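`LabelEncoder` numbers categories alphabetically, so `Month` becomes Apr=0, Aug=1, Dec=2 and so on rather than Jan=0 to Dec=11, and the `Month` row of the matrix above does not measure a seasonal trend. For columns that do have a natural order, pandas' ordered categoricals can supply it explicitly. A minimal sketch; the month abbreviations follow the `df.head()` output, while the severity order Slight < Serious < Fatal is an assumption about this column's categories.

```python
import pandas as pd

MONTH_ORDER = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
# Assumed severity categories, ordered from least to most severe
SEVERITY_ORDER = ["Slight", "Serious", "Fatal"]


def ordered_codes(series, order):
    """Map values to 0..len(order)-1 following `order`; values not in `order` become -1."""
    return pd.Categorical(series, categories=order, ordered=True).codes


toy = pd.DataFrame({"Month": ["Jan", "Apr", "Dec"],
                    "Accident_Severity": ["Slight", "Fatal", "Serious"]})
toy["Month_num"] = ordered_codes(toy["Month"], MONTH_ORDER)
toy["Severity_num"] = ordered_codes(toy["Accident_Severity"], SEVERITY_ORDER)
print(toy[["Month_num", "Severity_num"]].values.tolist())  # [[0, 0], [3, 2], [11, 1]]
```

The same calls could replace the `LabelEncoder` step for `Month` and `Accident_Severity` before `df_corr.corr()`; the remaining nominal columns have no natural order, so correlations on their codes stay hard to interpret.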

Heatmap

# Plotting the heatmap
plt.figure(figsize=(12, 12))
sns.heatmap(corr_matrix, annot=True, cmap='viridis', fmt='.2f', linewidths=1)
plt.title('Correlation Heatmap')
plt.show()

Insights and Analysis

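A few insights can be drawn from the correlations between the variables. Because most columns are categorical and were label-encoded alphabetically, these coefficients are rough indicators of association rather than precise measures: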

  1. Month and Weather_Conditions: The correlation between Month and Weather_Conditions is only -0.02, which is effectively zero. It does not indicate a seasonal pattern in weather conditions, particularly since LabelEncoder orders the months alphabetically rather than chronologically.

  2. Junction_Control and Road_Type: There is a weak positive correlation of 0.10 between Junction_Control and Road_Type. This suggests that the type of junction control (e.g., traffic signals, give way or uncontrolled) may be somewhat related to the type of road (e.g., single carriageway, dual carriageway).

  3. Accident_Severity and Number_of_Casualties: There is a weak negative correlation of -0.08 between Accident_Severity and Number_of_Casualties. Assuming the severity labels are Fatal, Serious and Slight, LabelEncoder codes them alphabetically, so a higher code means a less severe accident; the correlation therefore suggests, if anything, that more severe accidents involve slightly more casualties.

  4. Latitude and Longitude: There is a negative correlation of -0.37 between Latitude and Longitude, meaning that accidents recorded further north tend to lie further west. This reflects the geographical distribution of accident locations in the dataset rather than any causal relationship.

  5. Speed_limit and Road_Type: There is a moderate negative correlation of -0.34 between Speed_limit and Road_Type. This suggests that different types of roads (e.g., motorways vs urban roads) have different speed limits, which is expected but underscores the importance of road type in determining speed limits.

  6. Road_Surface_Conditions and Weather_Conditions: There is a moderate positive correlation of 0.51 between Road_Surface_Conditions and Weather_Conditions. This indicates that adverse weather conditions (e.g., rain, snow) are associated with poorer road surface conditions (e.g., wet, icy).

  7. Urban_or_Rural_Area and Speed_limit: There is a strong negative correlation of -0.68 between Urban_or_Rural_Area and Speed_limit. This suggests that speed limits tend to be lower in urban areas than in rural areas, possibly due to higher traffic density and safety considerations.
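Most columns behind the correlations listed above are nominal, so the sign and size of a Pearson coefficient on their label codes depend on the arbitrary alphabetical coding. Cramér's V, computed from a contingency table, is a common encoding-independent alternative (0 means no association, 1 a perfect one); it is swapped in here as a cross-check and was not part of the original analysis. A minimal sketch with pandas and NumPy only, without the small-sample bias correction:

```python
import numpy as np
import pandas as pd


def cramers_v(x, y):
    """Cramér's V between two categorical Series (uncorrected)."""
    table = pd.crosstab(x, y).to_numpy(dtype=float)
    n = table.sum()
    # Expected counts under independence: row total * column total / n
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    k = min(table.shape) - 1
    if k == 0:
        return 0.0  # one variable has a single category: report no association
    return float(np.sqrt(chi2 / (n * k)))


# Toy data: y is fully determined by x, z is independent of x
x = pd.Series(["a", "a", "b", "b"])
y = pd.Series(["p", "p", "q", "q"])
z = pd.Series(["p", "q", "p", "q"])
print(cramers_v(x, y))  # 1.0
print(cramers_v(x, z))  # 0.0
```

Applied to the notebook's data, `cramers_v(df['Weather_Conditions'], df['Road_Surface_Conditions'])` would give an encoding-independent counterpart to item 6; `pd.crosstab` silently drops rows with missing values in either column.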

Conclusion

The analysis of road accident data for 2021-22 examines how weather conditions, road and junction types, geographical distribution, and speed limits relate to road accidents. The clearest associations are between speed limit and urban or rural area (-0.68), road surface and weather conditions (0.51), and speed limit and road type (-0.34); most other correlations are weak. By understanding these relationships, policymakers and stakeholders can develop targeted strategies to mitigate road accidents and enhance safety. Additionally, the national and state-wise comparisons for India provide valuable information for identifying high-risk areas and implementing effective safety measures. Overall, a data-driven approach to road safety can help reduce accidents and save lives.